A novel hypothesis-generating approach for detecting phenotypic associations using epigenetic data

Aim: Hypotheses about what phenotypes to include in causal analyses, that in turn can have clinical and policy implications, can be guided by hypothesis-free approaches leveraging the epigenome, for example. Materials & methods: Minimally adjusted epigenome-wide association studies (EWAS) using ALSPAC data were performed for example conditions, dysmenorrhea and heavy menstrual bleeding (HMB). Differentially methylated CpGs were searched in the EWAS Catalog and associated traits identified. Traits were compared between those with and without the example conditions in ALSPAC. Results: Seven CpG sites were associated with dysmenorrhea and two with HMB. Smoking and adverse childhood experience score were associated with both conditions in the hypothesis-testing phase. Conclusion: Hypothesis-generating EWAS can help identify associations for future analyses.


Background
Epigenome-wide association studies (EWAS) have been widely used in epidemiology over the last decade to explore biomarkers and etiologies of health traits and disease [1]. This explosion in use is due to both the volume and nature of the epigenetic data available to researchers, thanks to advances in bead-based microarray technology to measure levels of DNA methylation (DNAm) at individual CpG sites [2]. The epigenome is a dynamic system of mitotically heritable markers that can control gene expression without changing the underlying genetic sequence; it can be altered by environmental exposures associated with a multitude of phenotypes [3]. Traditional DNAm-based EWAS analyses aim to identify phenotype-CpG associations, under the assumption that these associations may either be causal (i.e., the CpG causes the phenotype or vice versa) or represent confounding in a way that could still be useful for indicating (historical) exposures or predicting future outcomes [4] (e.g., methylation at AHRR can be used to identify current and former smokers, and is predictive of lung cancer, even in the absence of causal mediation [5]). DNAm at specific CpGs can be thought of as phenotypes, because their variation has both a genetic and an environmental basis [6]. This feature means that, even in the absence of any causal epigenetic relationship, DNAm data may be useful for identifying (potentially causal) associations between other, non-epigenetic phenotypes, which can be followed by more causally motivated analyses. This kind of hypothesis generation is particularly useful for under-researched phenotypes and conditions where limited previous literature is available to guide hypothesis-driven approaches.
Menstrual health is one such example of an under-researched area. For reasons related to entrenched gender inequalities and menstrual stigma [7], there are still knowledge gaps around the risk factors and consequences of experiencing problematic menstrual symptoms like menstrual pain (dysmenorrhea) and heavy (or prolonged) menstrual bleeding (HMB), both of which are difficult to quantify and diagnose. However, these are common symptoms that affect a large proportion of the menstruating population throughout the life course and that may impact on day-to-day life [8]; these symptoms can also be considered important indicators of other domains of health and wellbeing [9]. The prevalence of these symptoms in adolescence is estimated to be between 43% and 93% for dysmenorrhea [10] and 37% for HMB [11]. Behaviours such as smoking and characteristics such as body mass index (BMI) have been linked inconclusively to dysmenorrhea, with a number of studies, but not all, reporting associations [12-15]. The picture is even less clear for HMB outside comorbidities such as ovulatory dysfunction and coagulation disorders [16], with potential links to high BMI, smoking and alcohol consumption [17].
In this study, we aimed to use a minimally adjusted EWAS to leverage confounding and identify phenotypes that may be associated with dysmenorrhea and HMB as example conditions; these associations were then tested in the wider cohort data in a later phase. In this paper, we use the term phenotype to refer to any non-genetic characteristic that might be a potential risk factor. The approach, in short, involved: identifying condition-CpG associations by running EWAS in the Avon Longitudinal Study of Parents and Children (ALSPAC) among those (G1) adolescents with epigenetic data; looking up identified CpGs in an online repository of published EWAS results (EWAS Catalog [18]) to identify phenotypes associated with those CpGs (i.e., generating hypotheses about phenotypic associations); and testing those hypotheses in the full ALSPAC cohort to explore associations between our conditions (dysmenorrhea and HMB) and the EWAS Catalog phenotypes (i.e., testing hypotheses). The two conditions under investigation here are understudied and prevalent, with little known about modifiable risk factors, thus making them useful case studies for the proposed utility of the hypothesis-generating EWAS approach.

Study population
ALSPAC is a longitudinal birth cohort study that recruited pregnant women between 1990 and 1992 in the former county of Avon, in the Southwest of England [19, 20]. The initial sample consisted of 14,541 pregnancies enrolled in the first phase; a further 906 pregnancies were added at subsequent recruitment phases, leading to 14,901 index children remaining in the cohort after 1 year [19, 20]. A subset of 1,018 ALSPAC mother-child pairs, selected based on the availability of DNA samples, was included in the Accessible Resource for Integrated Epigenomic Studies (ARIES) study, which generated DNA methylation data at three timepoints: birth, childhood and adolescence [21].
The initial eligibility criteria for this study were being a female index child (G1 participant) and having responded to at least one of the nine "G1 puberty" questionnaires sent between the ages of 8 and 17 years, specifically the section pertaining to menstruation (whether it had started, whether there were issues associated with it, etc.). Among those who fulfilled the initial eligibility criteria, we then identified those who had participated in ARIES and had methylation data (described below). The exclusions are summarised in the flow diagram (Figure 1).
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committee. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Consent for biological samples was collected in accordance with the Human Tissue Act 2004. For data collection from participants at 22 years old and onwards, study data were collected and managed using REDCap (Research Electronic Data Capture) electronic data capture tools hosted at the University of Bristol [22]. The ALSPAC study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/) [23].

Definition of menstrual symptoms
In each of the nine G1 puberty questionnaires, the participant's caregiver (age 8-13) or the participant (age 14 and over) was asked "Have you/your daughter ever had any of the following symptoms associated with your/her period: Severe cramps?" and "Heavy or prolonged bleeding?". Answering "Yes" to either question prompted a follow-up question asking whether a doctor had been contacted for the symptom. Other than at age 15 for cramps, we did not have self-reported severity for either symptom, so for the purpose of the current study, we used "ever having visited the doctor" to define dysmenorrhea or HMB. Those who never reported having experienced the symptom (or who reported only mild cramps at age 15, for dysmenorrhea) were designated controls. The menstrual health-related data in ALSPAC are summarised in detail elsewhere [24].

Identifying novel traits associated with menstrual symptoms
The approach is detailed in Figure 2 and below.

Figure 2. Our methodology for identifying novel phenotypes associated with dysmenorrhea and heavy menstrual bleeding in adolescents using a hypothesis-generating epigenome-wide association study approach. Created with BioRender.com.

Hypothesis-generating phase
Generation and preparation of the DNA methylation data for ARIES is described in detail in the ARIES Data Resource Profile [21]. Briefly, DNA methylation in peripheral blood samples obtained from 1018 adolescents (aged 15 or 17 years old) was measured using the Illumina Infinium® HumanMethylation450K BeadChip assay [21]. This array contains probes that can measure the methylation of over 450,000 CpGs located throughout the human genome, driven by DNA bisulfite conversion [25]. The level of methylation is given as a beta-value (β), ranging from not methylated (β = 0) to fully methylated (β = 1) [26]. Data obtained from the array were pre-processed, normalised and quality controlled using the R package meffil [27]. We removed 11,648 probes that mapped to either the X or Y chromosomes (removal of sex chromosomes is common practice in EWAS analyses, so little information on them is available in the EWAS Catalog), 901 SNP and control probes and 3853 probes that had a high detection p-value. Following the removal of these probes, samples with outlying methylation values were identified using the Tukey method (outside the 25th and 75th percentiles ± three times the interquartile range) [28] and removed for individual probes. To account for cross-reactive probes and polymorphic CpGs, as guided by Chen et al., we retained all remaining probes and then checked the results against their list so as not to misinterpret spurious signals [29]. In the final analysis, 470,334 probes remained.
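
The per-probe Tukey rule described above can be sketched in a few lines of Python. This is an illustration only, not the meffil routine used in the study: the function name `mask_tukey_outliers`, the choice of the "inclusive" quantile method and the example beta-values are all assumptions.

```python
from statistics import quantiles

def mask_tukey_outliers(betas, k=3.0):
    """Set values outside [Q1 - k*IQR, Q3 + k*IQR] to None for one probe.

    k=3.0 mirrors the three-times-IQR rule in the text; the quantile
    method is an assumption and may differ from the software used.
    """
    q1, _, q3 = quantiles(betas, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [b if lower <= b <= upper else None for b in betas]

# Hypothetical beta-values for a single probe across eight samples.
probe = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.95]
cleaned = mask_tukey_outliers(probe)  # only the last value is masked
```

Masking individual values, rather than dropping whole samples, matches the statement that outliers were removed for individual probes.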

Epigenome-wide association studies: hypothesis-generating phase
The relationships between methylation and menstrual symptoms were explored using a cross-sectional design within ALSPAC (i.e., both DNA methylation and menstrual symptoms were measured at ages that overlapped). Association between dysmenorrhea/HMB case/control status and variation in methylation was assessed using linear regression. Case status (dysmenorrhea or HMB at any time during adolescence) was treated as the exposure in the EWAS, with methylation included as the outcome, since our aim was not to identify causal associations. Most dysmenorrhea (n = 65) and HMB (n = 46) cases reported having experienced the symptom prior to their methylation being measured, either at 15 or, for the majority of ARIES participants, 17 years old. The small number of participants (n = 5) whose methylation was measured before the first reporting of their menstrual symptom were dealt with in sensitivity analysis (described below). Models were purposefully simple: we adjusted only for age at methylation measurement (given that the adolescent peripheral blood samples were collected at either age 15 or 17) and surrogate variables (SVs) generated to capture technical batch effects. SVs were generated using SV analysis (SVA) and the number of SVs to generate was estimated as part of the SVA pipeline, based on the dataset (nSV = 24 and nSV = 33 for the dysmenorrhea and HMB EWAS, respectively). All analyses were performed using the R package meffil (which draws on the R package isva to generate surrogate variables) [27] using R version 3.6.3. To enable the methylation data to capture variation in a wide range of other traits, we performed no further adjustment, e.g., for cell counts or hormonal contraception. Although adjustment for cell composition is standard in other epigenetic studies where CpG methylation is a cause of interest, we did not want to mask any generated hypotheses about cell composition, such as immune dysregulation, by adjusting for it.
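
To make the model structure concrete, the sketch below fits the per-CpG regression described above (methylation as the outcome, case status as the exposure, plus covariates) using only the Python standard library. It is not the meffil pipeline: the helper names `ols` and `case_effect` and the data are hypothetical, age stands in for the full covariate set (surrogate variables would be extra columns), and standard errors and p-values are omitted.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def case_effect(betas, case, covariates):
    """Coefficient for case status with methylation as the outcome."""
    X = [[1.0, float(c), *cov] for c, cov in zip(case, covariates)]
    return ols(X, betas)[1]

# Hypothetical data: six samples, one CpG, age as the only covariate.
case = [0, 1, 0, 1, 0, 1]
age = [[15.0], [15.0], [17.0], [17.0], [17.0], [15.0]]
betas = [0.4 + 0.1 * c + 0.01 * a[0] for c, a in zip(case, age)]
effect = case_effect(betas, case, age)  # close to 0.1 by construction
```

In a full EWAS the same model is fitted once per probe, and the case-status coefficient and its p-value are kept for each CpG.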

EWAS Catalog look-up
From each EWAS (of dysmenorrhea and HMB), we selected differentially methylated CpGs with a p-value < 1 × 10⁻⁵ and performed a look-up of these CpGs, or the genes they mapped to, in the EWAS Catalog [18]. The EWAS Catalog is a repository of phenotype-CpG associations identified through published EWAS. We created a list of phenotypes associated with these CpGs and their resident genes; although pleiotropy was present for some CpGs, an enrichment analysis was not deemed necessary due to the hypothesis-generating nature of the analysis.
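
The selection and look-up logic can be sketched as follows. This assumes that EWAS results and catalog entries are available as simple records; the field names (`cpg`, `gene`, `p`, `trait`), the function names and the placeholder identifiers are hypothetical and do not reflect the actual EWAS Catalog file format.

```python
P_THRESHOLD = 1e-5  # suggestive threshold used in the text

def select_hits(ewas_results, threshold=P_THRESHOLD):
    """Return rows whose p-value is below the threshold."""
    return [row for row in ewas_results if row["p"] < threshold]

def lookup_traits(hits, catalog):
    """Collect catalog traits linked to each hit CpG or its resident gene."""
    hit_cpgs = {h["cpg"] for h in hits}
    hit_genes = {h["gene"] for h in hits if h["gene"]}
    cpg_traits, gene_traits = set(), set()
    for entry in catalog:
        if entry["cpg"] in hit_cpgs:
            cpg_traits.add(entry["trait"])    # same CpG site
        elif entry["gene"] in hit_genes:
            gene_traits.add(entry["trait"])   # other CpG in the same gene
    return cpg_traits, gene_traits

# Placeholder records, for illustration only.
ewas_results = [
    {"cpg": "cg_example_1", "gene": "GENE_A", "p": 3e-6},
    {"cpg": "cg_example_2", "gene": "GENE_B", "p": 0.02},
]
catalog = [
    {"cpg": "cg_example_1", "gene": "GENE_A", "trait": "smoking"},
    {"cpg": "cg_example_9", "gene": "GENE_A", "trait": "gestational age"},
    {"cpg": "cg_example_2", "gene": "GENE_B", "trait": "body mass index"},
]
hits = select_hits(ewas_results)
cpg_traits, gene_traits = lookup_traits(hits, catalog)
```

Keeping CpG-level and gene-level traits in separate sets mirrors the distinction between "CpG traits" and "gene traits" in the results tables.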

Hypothesis-testing phase in full ALSPAC sample
We used the ALSPAC data dictionary to find data on corresponding CpG-associated phenotypes in ALSPAC, assessed during gestation (for prenatal exposures) or pre-puberty (for childhood traits). Pre-puberty measurements of traits were chosen to determine whether the identified phenotype preceded the onset of these symptoms, and thus may be a candidate for future testing in causal analyses in other datasets as potential risk factors for either menstrual symptom. Further details of this process are described in the Supplementary Material. To explore the phenotypes identified in the look-up phase, we performed logistic regressions with each identified phenotype included in turn as the exposure and the menstrual symptoms (dysmenorrhea or HMB) in ALSPAC adolescents as the outcome, with and without adjustment for socioeconomic position (SEP) and age at menarche (AAM). SEP has been shown to be associated with both menstrual symptoms [30-33] and most phenotypes; younger AAM is a risk factor for both menstrual symptoms [34] and is likely associated with several factors in adolescence included in the hypothesis-testing phase. Continuous variables were converted to standardized z-scores before running these analyses to enable comparison of effect estimates for variables on different scales. Phenotypes where numbers of cases and/or controls contained fewer than five participants were omitted from the analysis due to inadequate power.
To test previously identified associations between menstrual symptoms and phenotypes such as gynecological and endocrine disorders, socioeconomic position, contraception and age at menarche, we also included these alongside the series of logistic regressions of novel associations.
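
As an illustration of the unadjusted models, the sketch below standardises a continuous exposure and fits a single-exposure logistic regression by Newton-Raphson, using only the Python standard library. It is not the code used in the study: the function names and data are hypothetical, the adjusted models (SEP and AAM) would need extra columns, and the use of the population standard deviation in the z-score is an assumption.

```python
import math
from statistics import mean, pstdev

def zscore(values):
    """Standardise a continuous variable to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient terms
            g1 += (yi - p) * xi
            h00 += w                  # information matrix terms
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical binary exposure: 6/10 exposed and 3/10 unexposed are cases.
exposure = [1] * 10 + [0] * 10
outcome = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
_, b1 = fit_logistic(exposure, outcome)
odds_ratio = math.exp(b1)  # approximately 3.5, the 2x2 cross-product ratio
```

For a continuous exposure, passing `zscore(values)` as `x` gives an odds ratio per standard deviation, which is what makes estimates comparable across variables on different scales.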

Sensitivity analyses
We ran the EWAS analysis for each symptom again after removing cases of thyroid problems (self-reported at 17 years), polycystic ovary syndrome (PCOS) and endometriosis (self-reported at 22 years) to limit the effect these conditions might have had on the findings. We also repeated the hypothesis-testing phase among participants who had reported the symptom during puberty but had not visited the doctor, to identify whether any characteristics were associated with a less "severe" presentation of dysmenorrhea or HMB, and again after excluding those who first reported either symptom after their methylation was measured for ARIES. We investigated use of oral contraception in the hypothesis-testing phase rather than excluding users, given the high prevalence of use among adolescents.

Results
Of the original ALSPAC cohort, 7284 participants were female (49% of the children who were alive at 1 year old) and 4222 of these participants responded to at least one of the questionnaires sent out during puberty, stating whether they had started their period. Of these, 487 individuals had DNA methylation data at adolescence in ARIES (Figure 1) and were included in the hypothesis-generating phase (QQ plots available in Supplementary Figures S1 & S2). In the hypothesis-testing phase, we performed complete case analyses, so the denominator differed by regression model depending on the missing data for each phenotype (Table 1).

Dysmenorrhea
We identified seven differentially methylated CpG sites (p < 1 × 10⁻⁵) (Table 2) in adolescents who suffered from severe dysmenorrhea compared with those who did not. None of these CpGs are represented in the Chen list of cross-reactive or polymorphic probes. DNA methylation at these CpG sites was associated with 9 phenotypes, and the genes they sit in were associated with a further 22 phenotypes in the EWAS Catalog look-up.

HMB
In the EWAS of HMB, we identified two differentially methylated CpG sites with p < 1 × 10⁻⁵ (Table 3). As with dysmenorrhea, neither of these CpGs is represented in the Chen list. When we performed a look-up of these CpGs, as well as the genes they mapped to, in the EWAS Catalog, we identified 10 associated phenotypes.

Hypothesis-testing phase
Tables 2 & 3 footnote: Probe ID denotes the differentially methylated CpG site. β represents the extent of methylation, with respective p-values. Probe position represents the chromosomal location of the CpG site, while CpG traits denotes traits that have been identified to be associated with that CpG site. The gene column shows in which gene the CpG site in question resides, with gene traits representing traits that have been shown to be associated with CpG sites located within that gene.
From the list of phenotypes that were associated with differentially methylated CpGs in the hypothesis-generating phase (Tables 2 & 3), we derived the following variables from ALSPAC: maternal educational attainment, maternal smoking and alcohol consumption during pregnancy, maternal BMI, maternal hypertensive disorders of pregnancy (HDP), maternal pre-eclampsia, participant gestational age at delivery, participant BMI, cotinine and cholesterol levels at age 7, participant non-word repetition score (measure of cognition) at age 8, participant C-reactive protein (CRP) at age 9, participant cigarette and alcohol use at age 13, participant adverse childhood experience (ACE) score by 16 years old [46] and oral contraception use during puberty (derived from the G1 puberty questionnaires that asked specifically about oral contraception, detailed in the Supplementary Material). Of these, participant smoking and BMI had been identified a priori as associated with both menstrual symptoms. We then compared these variables for cases and controls for each condition in the wider ALSPAC cohort (n = 4,222). Other characteristics such as kidney disease, primary Sjögren's syndrome, melanoma, Crohn's disease and rheumatoid arthritis were available, but cases and/or controls for each symptom contained fewer than five participants so were omitted from the analysis as described in the methods. We found that, compared with unexposed participants, participants who had been exposed to smoke prenatally and who had smoked or drunk alcohol by age 13 were more likely to report dysmenorrhea in the unadjusted models (Figure 3). Higher BMI, CRP, cotinine and ACE score were also associated with an increased likelihood of reporting dysmenorrhea (Figure 4). Following adjustment for SEP and AAM, these effects attenuated, except for smoking at 13 years and ACE score at 16 years (aOR 1.61, 95% CI: 1.11-2.33 and aOR 1.30, 95% CI: 1.11-1.53, respectively) (Supplementary Table S1).
In the unadjusted HMB models, participants exposed prenatally to smoke and maternal HDP were more likely to report HMB during puberty; smoking and drinking alcohol by the age of 13 were also associated with an increased likelihood of reporting HMB (Figure 3). Higher BMI, cotinine and ACE score were positively associated with HMB in the unadjusted models (Figure 4). Following adjustment for SEP and AAM, these effects attenuated, except for smoking at 13 years and ACE score at 16 years (aOR 2.35, 95% CI: 1.69-3.26 and aOR 1.35, 95% CI: 1.15-1.57, respectively) (Supplementary Table S2).
Additionally, although all confidence intervals crossed the null, dysmenorrhea (but not HMB) was consistently associated with higher gestational age at delivery with relatively large effect estimates (aOR 1.41, 95% CI: 0.97-2.06) (Figure 4).

Sensitivity analysis
In the sensitivity analysis, where we ran the same EWAS analysis for each symptom with cases of thyroid problems at 17 years and PCOS or endometriosis at 22 years removed, four hits from the primary analysis persisted for dysmenorrhea (Supplementary Table S3) and one hit persisted for HMB (Supplementary Table S3). The effect estimates in the sensitivity analysis all followed the same direction as the estimates from the primary analysis, and for the most part strengthened, except for cg04737758, which attenuated toward the null. We then ran the hypothesis-testing phase in the cohort with these cases removed. Neither the direction of the associations with any trait nor the conclusions drawn from the primary analysis changed (Supplementary Figures S3 & S4).
When we performed the hypothesis-testing phase in less severe cases of each symptom, most associations attenuated toward the null. Associations that were stronger in the sensitivity analysis of less severe cases (exposure to maternal pre-eclampsia and cotinine at age 7 for HMB, and gestational age at birth for dysmenorrhea) were in the same direction as the primary analysis (Supplementary Figures S5 & S6).
Of the dysmenorrhea cases, < 5 participants reported pain for the first time after their methylation measurement was taken. For HMB, < 5 reported prior to methylation measurement. When these were excluded from the hypothesis-testing phase, our conclusions did not change (Supplementary Figures S7 & S8).

Discussion
In this study, we corroborated previously identified associations (i.e., BMI and smoking) and generated new hypotheses about phenotypes (i.e., ACEs and alcohol consumption) that may contribute to the development of adolescent dysmenorrhea and HMB. These hypotheses warrant further investigation in a causally motivated framework, as what we present here is purely associational. We were able to replicate previously reported associations of own smoking [12, 47] and higher BMI [15, 48] with both menstrual symptoms, guided by methylation markers, which supports the utility of this hypothesis-generating approach. We believe that this is the first study to present evidence that early life experiences such as ACEs and prenatal exposures such as maternal smoking are associated with these conditions. The identification of both previously identified associations and novel condition-phenotype relationships suggests that a hypothesis-generating EWAS approach may be useful to identify associations for future causal inference work.
[...] associations they identify between a trait of interest and differentially methylated CpGs may either be causal (i.e., the CpG/gene causes the trait), represent a historical exposure (i.e., flags someone as a smoker, for example), or highlight confounding. All these potential explanations for associations are useful when thinking about disease etiology and future analyses [4]. Their findings can focus subsequent causal analyses and can be implemented in scenarios where epigenetic data, as well as rich phenotypic data, are available. Genome-wide association studies (GWAS) are useful in such scenarios but only focus on genetics, whereas EWAS allows us to leverage confounding by incorporating exposures throughout early life that might be associated with later life conditions. In the context of exposome-wide association studies (ExWAS), EWAS has been employed to reduce exposome dimension and make efficiency gains [49], reflecting the intention of the present study, although here the aim is to improve efficiency in subsequent observational, non-WAS analyses. In order to test the combination of minimally adjusted EWAS with the hypothesis-testing phase for identifying associations, we wanted to use example conditions where few associations have been previously confirmed, so that there would be scope to identify novel traits. ALSPAC has repeat measures of the presence of dysmenorrhea and HMB throughout adolescence. In the literature, known causes and risk factors (outside diagnosed gynecological problems) are scant, with some evidence suggesting smoking [47] and BMI [15, 48] are associated with these conditions.

Novel findings
Having performed two EWAS, one for dysmenorrhea and the other for HMB, we identified seven and two differentially methylated CpG sites, respectively. The seven dysmenorrhea CpG hits and their resident gene regions were associated with 31 individual traits, including negative associations with smoking (previously identified) and alcohol consumption, child abuse and pre-eclampsia (novel). The two HMB CpG hits and their resident gene regions were associated with ten individual traits including smoking (previously identified) and gestational age, total cholesterol and pre-eclampsia (novel). In the hypothesis-testing phase, we identified that smoking and alcohol consumption at age 13, as well as ACE score at age 16, were all associated with dysmenorrhea and HMB, including after adjustments for SEP and AAM. Although causal effects cannot be inferred from these analyses, they provide evidence that further investigation into these traits may be able to illuminate mechanisms by which they are associated with dysmenorrhea and HMB.

Strengths & limitations
We present a potentially useful, epigenetic-based approach that can be implemented by leveraging confounding to identify phenotype-phenotype associations even on a small scale, provided there is sufficient access to epigenetic and phenotypic data. The small number of cases and controls limits our concerns that our findings may be a result of multiple testing. Given that the hypothesis-testing phase relies on a minimally adjusted EWAS analysis, a full complement of confounders is not required; participants are therefore less frequently excluded for not having complete covariate data, which is particularly useful if case and control numbers are small. The G1 puberty questionnaires in ALSPAC were sent out multiple times, giving us several timepoints across adolescence within which to identify cases for our conditions. Despite cord blood methylation being available in ALSPAC, we chose to use adolescent DNA methylation because we wanted to identify exposures across the life course to date that may be associated with the development of these conditions, in addition to genotypic differences.
A major limitation in this study was power, which was low, particularly compared with other EWAS. However, prior work leveraging small numbers of participants in epigenetic analyses has driven hypotheses in other fields; for example, a small hypothesis-generating EWAS of paternal smoking identified offspring DNA methylation that might be associated with development [50], a finding that was carried forward by another group investigating drivers of childhood autism [51]. Sample size was also a problem in the hypothesis-testing phase, as some phenotypes were underpowered in ALSPAC. This further reinforces the need for causally motivated analyses in other cohorts.
It is likely that larger genes may have been overrepresented by the approach we took to identify phenotypes, given that we investigated phenotypes associated with both the differentially methylated CpG and its resident gene. We chose to accept this limitation on the basis that it widened the net we were casting for generated hypotheses. Our approach relies on previous research that is published in the EWAS Catalog. Therefore, it is only able to identify associations with phenotypes where some epigenetic research has been previously conducted. Additionally, the EWAS Catalog is biased toward heavily investigated phenotypes, such as smoking, so such phenotypes were more likely to be identified in the hypothesis-generating phase than less frequently investigated phenotypes. In line with our hypothesis-generating approach, we opted to extract all EWAS Catalog phenotypes associated with the genes we identified in our menstrual EWAS. However, if future studies are concerned about biases in the EWAS Catalog, an enrichment analysis (e.g., Fisher's Exact Test) could be used to identify those phenotypes that have an unusually large representation in the list of phenotypes associated with the candidate CpGs identified in the discovery EWAS, relative to their representation in the entire Catalog. More broadly, most EWAS, including our own, are biased by the coverage of genes on the arrays used to obtain epigenetic data.
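
The enrichment analysis suggested above, which was not performed in this study, could take the form of a one-sided Fisher's exact test, equivalent to a hypergeometric tail probability. The sketch below is a hypothetical illustration using the Python standard library; the function name, argument names and counts are assumptions, and treating CpGs as the unit of counting is one of several possible choices.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric) p-value, P(X >= k).

    N: CpGs in the whole catalog
    K: catalog CpGs associated with the trait of interest
    n: candidate CpGs from the discovery EWAS
    k: candidate CpGs associated with the trait
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 2 of 2 candidate CpGs hit a trait that covers
# 3 of the 10 CpGs in a toy catalog.
p_value = enrichment_p(k=2, n=2, K=3, N=10)  # 3/45 = 1/15
```

A small p-value would indicate that the trait appears among the candidate CpGs more often than its overall share of the Catalog would predict, which helps to separate genuinely enriched traits from heavily studied ones.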
Misclassification of case status was a concern, given the mix of caregiver and adolescent responses used to ascertain cases of each symptom; however, there was minimal disagreement between answers given at multiple timepoints. Although multiple testing burden was high (~470K probes) compared with other hypothesis-generating approaches, like phenome-wide association studies (PheWAS), data are not appropriately processed in ALSPAC for PheWAS. Thus, we propose the use of our approach where epigenetic and phenotypic data are available but the data are not coded up for PheWAS. Temporality was well established in the hypothesis-generating phase, whereby the majority of those defined as a case in the EWAS reported the first instance of their symptom prior to their methylation being measured; however, this was not as simple in the following phase for phenotypes and symptoms. Despite our best efforts to maintain sensible temporality in the hypothesis-testing phase, sometimes we were not able to derive phenotype variables (exposure) that were definitively before the condition onset (outcome). For example, the ACE score variable used to explore the potential association between child abuse and the worked example conditions was a composite variable of ACEs up to the age of 16, given as a score [46]. It is likely that some of the participants in the hypothesis-testing phase analysis of ACE score may have had ACEs that contributed to their score at age 15 or 16, where the onset of their dysmenorrhea was earlier in puberty. However, as we were not doing any causal analyses, merely attempting to identify associations between potential risk factors and menstrual conditions, upholding temporality was not of utmost importance.
The findings from our hypothesis-testing phase of dysmenorrhea and HMB were only internally validated in the wider ALSPAC cohort, as opposed to in another cohort; replication is crucial to draw further inference from these tentative findings. It is important that future studies investigating the potential associations identified here do so in other independent cohorts where menstrual health data are available. It is likely that some characteristics explored in the hypothesis-testing phase are highly correlated; future studies where temporality of menstrual symptoms and potential risk factors is well-established would be best-placed to investigate these in more detail. The associations we observe in the hypothesis-testing phase may be completely mediated or modified by the much higher prevalence of hormonal contraception use in those with either condition; it is important to investigate the role of contraception for each association separately in future analyses. As Sawyer et al. point out, conditioning on hormonal contraception use might introduce collider bias if our phenotype of interest and condition are likely to influence contraception use [24]. Finally, we are aware that the definition of severity for each condition (those who had visited a doctor for the symptom) may instead reflect socioeconomic, cultural and personal factors for certain participants that will have influenced why they sought medical advice for their symptoms while others did not, which has been highlighted in other fields [52].

Conclusion
We used an epigenome-led approach to generate hypotheses regarding potential risk factors, using dysmenorrhea and HMB as example phenotypes. The novel approach used here, leveraging both a hypothesis-generating and -testing phase, as well as confounding relationships, detected both known and novel associations between menstrual symptoms and environmental or physiological exposures. This novel approach could be added to the arsenal of exploratory analyses that drive hypotheses for future causal analyses in a range of understudied health problems, including menstrual health epidemiology.

Background
• DNA methylation can be explored in an epigenome-wide association study (EWAS) context to identify causal mechanisms or confounding relationships between exposures that can alter the epigenome and phenotypes of interest.
• In a minimally adjusted, hypothesis-generating EWAS, associations can be identified that represent either of these relationships, to be further explored in causally motivated analyses for conditions that are understudied.
• In the present study, we demonstrated the utility of a hypothesis-generating EWAS approach followed by a hypothesis-testing phase using logistic regression, investigating two understudied conditions as examples: dysmenorrhea (painful periods) and heavy menstrual bleeding (HMB).

Materials & methods
• We used the Avon Longitudinal Study of Parents and Children (ALSPAC) to identify cases of adolescent dysmenorrhea and HMB; those with epigenetic data from ARIES were included in two hypothesis-generating EWAS where each condition served as the exposure.
• The hypothesis-generating phase consisted of two minimally adjusted EWAS of dysmenorrhea and HMB to identify differentially methylated CpGs.
• CpGs identified in the hypothesis-generating phase were then searched in the EWAS Catalog to find traits associated with each CpG and its resident gene.
• The hypothesis-testing phase involved taking the identified traits and investigating their association with dysmenorrhea and HMB in the wider ALSPAC cohort.
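The hypothesis-generating steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the CpG identifiers, p-values and the in-memory `catalog` dictionary are invented stand-ins for EWAS output and an EWAS Catalog export, and the Bonferroni threshold is one common choice that is assumed here rather than taken from the text.

```python
# Hypothetical sketch: from EWAS results to candidate traits for hypothesis testing.
# All identifiers, p-values and catalog entries below are invented for illustration.

N_PROBES = 470_000            # approximate number of probes tested (from the text)
ALPHA = 0.05
THRESHOLD = ALPHA / N_PROBES  # Bonferroni threshold; an assumed choice, not the study's

# Toy EWAS output: CpG identifier -> p-value for association with the condition.
ewas_results = {
    "cg_example_1": 3.0e-9,
    "cg_example_2": 4.0e-8,
    "cg_example_3": 2.0e-4,
}

# Toy stand-in for a catalog export: CpG identifier -> previously reported traits.
catalog = {
    "cg_example_1": ["smoking", "alcohol consumption"],
    "cg_example_2": ["smoking"],
    "cg_example_3": ["body mass index"],
}


def candidate_traits(results, catalog, threshold):
    """Return {trait: sorted list of significant CpGs linked to it in the catalog}."""
    traits = {}
    for cpg, p_value in results.items():
        if p_value >= threshold:
            continue  # not differentially methylated at this threshold
        for trait in catalog.get(cpg, []):
            traits.setdefault(trait, []).append(cpg)
    return {trait: sorted(cpgs) for trait, cpgs in traits.items()}


traits = candidate_traits(ewas_results, catalog, THRESHOLD)
for trait, cpgs in sorted(traits.items()):
    print(trait, cpgs)
```

The traits returned here would then be proxied by cohort variables and carried forward to the hypothesis-testing phase; traits linked only to CpGs that miss the threshold are dropped.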

Results
• Having found seven differentially methylated CpGs for dysmenorrhea and two for HMB, we searched them in the EWAS Catalog and identified phenotypes associated with each of them.
• In the hypothesis-testing phase, we proxied the phenotypes found in the EWAS Catalog using variables from ALSPAC and included them in minimally adjusted logistic regression models where each condition served as the outcome.

Discussion
• Using this approach, we found that smoking and alcohol use at age 13 were associated with dysmenorrhea and HMB; higher cotinine levels at age 7 were associated with HMB. Higher adverse childhood experience (ACE) score was associated with both conditions.
• We identified several potential targets of investigation for future research into risk factors for dysmenorrhea and HMB. Although temporality was not easily established in the present study and causality could not be determined, we leveraged confounding to guide future causally motivated analyses in other cohorts with menstruation data.
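For a single binary phenotype, an unadjusted logistic regression reduces to the odds ratio from a 2×2 table, so the hypothesis-testing step can be illustrated without a modelling library. The sketch below is a simplification: the study used minimally adjusted models, whereas this version omits covariates entirely, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_controls,
                    unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio and Wald 95% CI for a binary exposure and a binary outcome.

    Equivalent to an unadjusted logistic regression with one binary predictor.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts: the phenotype is the exposure, the menstrual condition the outcome.
or_estimate, ci_low, ci_high = odds_ratio_wald(30, 70, 20, 80)
print(f"OR = {or_estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that includes 1, as with these invented counts, would not support an association; in practice a regression library would be needed to add the adjustment covariates.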

Figure 3. Coefficient plot representing binary phenotypes associated with dysmenorrhea and heavy menstrual bleeding in the hypothesis-testing phase. * Identified as an associated trait a priori.

Figure 4. Coefficient plot representing continuous phenotypes associated with being a dysmenorrhea or heavy menstrual bleeding case in the hypothesis-testing phase. * Identified as an associated trait a priori.

Table 1. Binary (n, %) and continuous (mean, standard deviation) characteristics in cases and controls for each symptom identified in the hypothesis-generating EWAS, with missing data for each variable. G0 refers to mothers and G1 refers to adolescents.

Table 2. Differentially methylated CpG sites identified in the hypothesis-generating EWAS of dysmenorrhea.

Table 3. Differentially methylated CpG sites identified in the hypothesis-generating EWAS of heavy menstrual bleeding.